Evaluation of Distance Measures Between Gaussian Mixture Models of MFCCs

Authors

  • Jesper Højvang Jensen
  • Daniel P. W. Ellis
  • Mads Græsbøll Christensen
  • Søren Holdt Jensen
Abstract

In music similarity and in the related task of genre classification, a distance measure between Gaussian mixture models is frequently needed. We present a comparison of the Kullback-Leibler distance, the earth mover's distance and the normalized L2 distance for this application. Although the normalized L2 distance was slightly inferior to the Kullback-Leibler distance with respect to classification performance, it has the advantage of obeying the triangle inequality, which allows for efficient searching.

1. A Statistical Timbre Model

We use the common approach of extracting mel-frequency cepstral coefficients (MFCCs) from a song, modeling them by a Gaussian mixture model (GMM), and using a distance measure between the GMMs as a measure of the musical distance between the songs [2, 4, 6].

1.1 Mel-Frequency Cepstral Coefficients

MFCCs are a compact, perceptually based representation of speech frames [3]. They are computed as follows:

  1. Estimate the log-amplitude or log-power spectrum of 20–30 ms of speech.
  2. Sum the contents of neighboring frequency bins in overlapping bands distributed according to the mel scale.
  3. Compute the discrete cosine transform of the bands.
  4. Discard the high-frequency coefficients of the cosine transform.

1.2 Gaussian Mixture Models

We model the MFCCs from each song by a Gaussian mixture model (GMM):

  p(x) = \sum_{k=1}^{K} a_k \frac{1}{\sqrt{|2\pi\Sigma_k|}} \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right),

where K is the number of components and a_k, \mu_k and \Sigma_k are the weight, mean and covariance matrix of the k-th component. For K = 1, a closed-form expression exists for the maximum-likelihood estimate of the parameters. For K > 1, the k-means algorithm and optionally the expectation-maximization algorithm are needed.

2. Distance Measures Between GMMs

As distance measure between the GMMs, we have evaluated the symmetrized Kullback-Leibler distance, the earth mover's distance and the normalized L2 distance.

2.1 Kullback-Leibler Distance

The KL distance is given by

  d_{KL}(p_1, p_2) = \int p_1(x) \log \frac{p_1(x)}{p_2(x)} \, dx.    (1)

As the KL distance is not symmetric, we use a symmetrized version,

  d_{sKL}(p_1, p_2) = d_{KL}(p_1, p_2) + d_{KL}(p_2, p_1).    (2)

For Gaussian mixtures, a closed-form expression for d_{KL}(p_1, p_2) only exists for K = 1. For K > 1, d_{KL}(p_1, p_2) is estimated using stochastic integration or the approximation in [5].

2.2 Earth Mover's Distance

The earth mover's distance (EMD) is the minimum cost of changing one mixture into another when the cost of moving probability mass from component m in the first mixture to component n in the second mixture, c_{mn}, is given [7, 4]. Let a_{1k} be the weights of the Gaussians in p_1(x) and a_{2k} the weights in p_2(x); then d_{EMD}(p_1, p_2) is given by

  d_{EMD}(p_1, p_2) = \min_{f} \sum_m \sum_n c_{mn} f_{mn}    (3)

subject to

  f_{mn} \ge 0    (4)

and the constraints that the flows account for the component weights, \sum_n f_{mn} = a_{1m} and \sum_m f_{mn} = a_{2n}.
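
To make the four MFCC steps in Section 1.1 concrete, the following Python sketch computes the MFCCs of a single windowed frame. It is a minimal illustration rather than the authors' implementation: the mel filterbank matrix mel_fb is assumed to be precomputed and matched to the frame's FFT length, and, as in most implementations, the logarithm is applied to the band energies after the summation of step 2.

    import numpy as np
    from scipy.fft import dct

    def mfcc_frame(frame, mel_fb, n_coeffs=13, eps=1e-10):
        """Compute the MFCCs of one 20-30 ms windowed frame (illustrative sketch).

        frame    : 1-D array of audio samples (one short-time frame)
        mel_fb   : (n_bands, len(frame)//2 + 1) mel filterbank matrix (assumed given)
        n_coeffs : number of low-order DCT coefficients to keep (step 4)
        """
        # Step 1: power spectrum of the frame.
        power_spec = np.abs(np.fft.rfft(frame)) ** 2
        # Step 2: sum neighboring bins into overlapping mel-spaced bands,
        # then take the logarithm of the band energies.
        log_mel = np.log(mel_fb @ power_spec + eps)
        # Step 3: discrete cosine transform of the log band energies.
        cepstrum = dct(log_mel, type=2, norm='ortho')
        # Step 4: discard the high-frequency coefficients.
        return cepstrum[:n_coeffs]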
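
The parameter estimation in Section 1.2 can be sketched as follows. This assumes scikit-learn, which is not mentioned in the excerpt: its GaussianMixture estimator initializes with k-means and then runs EM for K > 1, while the K = 1 case reduces to the closed-form sample mean and covariance.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_song_model(mfccs, K=1):
        """Fit a K-component GMM (weights, means, covariances) to a song's MFCC frames."""
        if K == 1:
            # Closed-form maximum-likelihood estimate: sample mean and (biased) covariance.
            mu = mfccs.mean(axis=0)
            sigma = np.cov(mfccs, rowvar=False, bias=True)
            return np.array([1.0]), mu[None, :], sigma[None, :, :]
        # K > 1: k-means initialization followed by expectation-maximization.
        gmm = GaussianMixture(n_components=K, covariance_type='full',
                              init_params='kmeans').fit(mfccs)
        return gmm.weights_, gmm.means_, gmm.covariances_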
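
For K = 1, equations (1)-(2) have a closed form; using the standard expression for the KL divergence between two Gaussians (not reproduced in the excerpt above), a sketch is:

    import numpy as np

    def kl_gauss(mu1, sigma1, mu2, sigma2):
        """Closed-form d_KL(p1 || p2) between two single Gaussians (eq. 1 with K = 1)."""
        d = mu1.shape[0]
        inv2 = np.linalg.inv(sigma2)
        diff = mu2 - mu1
        _, logdet1 = np.linalg.slogdet(sigma1)
        _, logdet2 = np.linalg.slogdet(sigma2)
        return 0.5 * (logdet2 - logdet1 - d
                      + np.trace(inv2 @ sigma1)
                      + diff @ inv2 @ diff)

    def symmetrized_kl(mu1, sigma1, mu2, sigma2):
        """Symmetrized KL distance d_sKL of eq. (2)."""
        return (kl_gauss(mu1, sigma1, mu2, sigma2)
                + kl_gauss(mu2, sigma2, mu1, sigma1))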
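
For K > 1 the text mentions stochastic integration. One possible Monte Carlo realization (a sketch, not the authors' exact estimator) draws samples from each mixture and averages the log-density ratio; the gmm arguments are (weights, means, covariances) tuples such as those returned by fit_song_model above.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_logpdf(x, weights, means, covs):
        """Log-density of a GMM evaluated at the rows of x."""
        comp = np.stack([np.log(w) + multivariate_normal.logpdf(x, m, c)
                         for w, m, c in zip(weights, means, covs)])
        return np.logaddexp.reduce(comp, axis=0)

    def gmm_sample(weights, means, covs, n, rng):
        """Draw n samples from a GMM."""
        ks = rng.choice(len(weights), size=n, p=weights)
        return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in ks])

    def symmetrized_kl_mc(gmm1, gmm2, n=2000, seed=0):
        """Stochastic-integration estimate of d_sKL (eq. 2) for K > 1."""
        rng = np.random.default_rng(seed)
        x1 = gmm_sample(*gmm1, n, rng)
        x2 = gmm_sample(*gmm2, n, rng)
        d12 = np.mean(gmm_logpdf(x1, *gmm1) - gmm_logpdf(x1, *gmm2))
        d21 = np.mean(gmm_logpdf(x2, *gmm2) - gmm_logpdf(x2, *gmm1))
        return d12 + d21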
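
The EMD of eq. (3) is a small linear program over the flows f_mn. The sketch below uses scipy.optimize.linprog and takes the ground-cost matrix c_mn as given, as in the text; in practice c_mn is typically a distance between individual Gaussian components, for example the symmetrized KL above (an assumption, since the excerpt does not specify the ground cost).

    import numpy as np
    from scipy.optimize import linprog

    def emd_gmm(weights1, weights2, cost):
        """Earth mover's distance (eq. 3) between two GMMs.

        weights1 : (M,) component weights a_{1m} of the first mixture
        weights2 : (N,) component weights a_{2n} of the second mixture
        cost     : (M, N) ground-cost matrix c_{mn}
        """
        M, N = cost.shape
        c = cost.ravel()                       # flows f_{mn}, flattened row-major
        A_eq = np.zeros((M + N, M * N))
        for m in range(M):                     # sum_n f_{mn} = a_{1m}
            A_eq[m, m * N:(m + 1) * N] = 1.0
        for n in range(N):                     # sum_m f_{mn} = a_{2n}
            A_eq[M + n, n::N] = 1.0
        b_eq = np.concatenate([weights1, weights2])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method='highs')
        return res.fun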

Publication date: 2007